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neous environment is of critical importance in agroecology and conservation biology as it can 
provide management tools to limit the effects of pests or to increase the survival of endangered 
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diffusion parameters of spatially-explicit models based on stochastic differential equations, us¬ 
ing genetic data. Dividing the total population into subpopulations corresponding to different 
habitat patches with known allele frequencies, the expected proportions of individuals from 
each subpopulation at each position is computed by solving a system of reaction-diffusion equa¬ 
tions. Modelling the capture and genotyping of the individuals with a statistical approach, we 
derive a numerically tractable formula for the likelihood function associated with the diffusion 
parameters. 

In a simulated environment made of three types of regions, each associated with a differ¬ 
ent diffusion coefficient, we successfully estimate the diffusion parameters with a maximum- 
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1 Introduction 


Dispersal is one of the main forces driving population redistribution (Turchin]|1998), gene flow 


(Slatkin 1987 Bohonak 19991 and genetic diversity (Hewitt 2000 Roques et al. 2012 2014). 


Dispersal directly affects population flows between different spatial positions. Having a precise 
knowledge of these flows is of critical importance in agroecology and conservation biology, as it 
can provide management tools to limit the effects of pests (Gilligan 2008 Papai'x et al.||20lT | 
or to increase the survival of endangered species (Hanski and Gilpin 1996) by acting on the 
landscape structure. 

In heterogeneous environments, dispersal is often influenced by local landscape features. 
This local effect of the landscape on mobility can be captured by most spatially-explicit models 
at the scale of individual movement, such as in random walk models and stochastic differential 


equations ( 

Preisler et al. 

2004 

Smouse et al. 

2010 

) or at the scale of population density in 

reaction-diffusion models 

Shigesada and Kawasaki|1997 

Cantrell and Cosner|2003 

Ovaskainen 


Traditionally, Euclidian distances and least-cost distances have been used, e.g. in models 


of isolation by distance (Wright 1943 Rousset 1997 Broquet et al. 2006), to quantify the 
population flows. However, these approaches have several drawbacks; in particular, least-cost 
distances are based on a subjective definition of a cost function and assume a single and optimal 


migration path. Resistance-based approaches are more realistic (McRae 2006 Graves et al. 


2014), as the resistance distance is computed in a similar manner as in an electrical network, 


where all possible paths are taken into account. However, the computation of resistance distances 
is time-consuming and generally does not allow for a precise estimation of the local resistance 
parameter, e.g. by maximum likelihood, but rather to test a limited set of conjectured resistance 
values ( Graves et al.|2013 ), leading again to a subjective parametrization of the local resistance 
values. Additionally, although there exist random walk interpretations of the effective resistance 
in electrical networks (Doyle and Snell||1984 1, these approaches are not based on a mechanistic 


description of the spatio-temporal dynamics of a population (see Theorems 1 and 2 in Tetali 


1991), and therefore do not directly quantify population flows. 


The method developed in this paper enables direct and fast estimations of a spatially- 
heterogeneous parameter D(x) measuring the local mobility of individuals at the space position 
X, in mechanistic models based on stochastic differential equations. More precisely, we assumed 
that the individual trajectories followed Itb diffusion processes, corresponding to uncorrelated 
random walks with spatially-varying speed. This framework is widely used for analyzing move¬ 
ment, see Preisler et al. (2004); Smouse et al. ( 2010| and the references therein. In this frame¬ 
work, the expected population density at any time and space position can be computed using 


the corresponding Fokker-Planck partial differential equation (Gardiner 2009) with diffusion 


parameters D{x). Here, we also took into account death events occurring at exponentially dis¬ 
tributed times, leading to a reaction-diffusion description of the population density. Thanks to 
a well-developed theory for their numerical analysis and efficient softwares (e.g., freefem-l—h, 
see Hecht (2012), or Gomsol Multiphysics®), the numerical computation of the solutions of 


such reaction-diffusion equations is fast and reliable, even in the presence of heterogeneous co¬ 


efficients. This makes them ideally suited for parameter estimation (Soubeyrand and Roques 


2014). 


Several types of data can be used for the estimation of dispersal parameters in population 


models. Most studies bear on abundance data (Roques et al. 2011) or on mark-recapture ex¬ 
periments ( Turchin]|1998 Ovaskainen et al.]|2008 ), where individuals are marked with different 
technics, such as color markers or radioactive isotopes (Southwood and Henderson 2009). As 


opposed to this marking experiments, passive surveys of the relative abundance of genetic mark¬ 
ers that are naturally present in a population lead to spatio-temporal data which can be easier 
to obtain and more informative ( Robledo- Arnuncl^|2012 ). Here, we considered the dispersal of 
individuals starting from several habitats and which eventually died, defining the end of the 
dispersal period. The estimation of the parameter D{x) was based on measurements of the 
genotypes of individuals captured at several positions during the dispersal process, the location 
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of the habitats and the allele frequencies before dispersal in these sites being considered as 
known. Although numerous methods have been developed in landscape genetics to estimate 
dispersal from molecular markers ( Hamrick and TrapnellpOll ) we are not aware of any article 
estimating diffusion parameters from such data. 


The method is based on a mechanistic-statistical approach ( 

Berliner 

2003 Wikle 

2003 

Ovaskainen et al.||20081 |Soubeyrand et al.||20091 |Roques et al.||20111 Soubeyrand and Roques 

20141 included in the framework of state-space models (Patterson et al. 

2008 Durbin and 


Koopman|2012 ). This approach typically combines a mechanistic model describing the dynam¬ 


ics under investigation with a statistical model conditional on the dynamics, describing how the 
measurements have been collected, bridging the gap between the data and the model for the 
dynamics. In the mechanistic part of our model, we divided the total population into subpopu¬ 
lations at Hardy-Weinberg equilibrium, each one corresponding to the individuals coming from 
a different habitat patch. The dynamics of the different subpopulations were then described by 


a system of reaction-diffusion equations, as in Roques et al. (20121. Given a diffusion coefficient 
D{x), this allowed us to compute the expected proportions of individuals from each subpopu¬ 
lation at each space position. Conversely, the genotype data contain information about these 
proportions; namely, the probability to observe a given genotype at some trapping location 
depends on the respective contributions of each subpopulation and on the allele frequencies in 
these subpopulations. Modelling the capture of the individuals with a statistical approach, and 
using the Hardy-Weinberg equilibrium assumption in each subpopulation as a key ingredient, 
it was therefore possible to derive a numerically tractable formula for the likelihood function 
associated with the diffusion parameters D(x), given the genotypes of the captured individuals. 


Note: a summary of the notations used throughout this paper is provided in Table^ 


Notation 

Explanation 

u(t, x) 

density of dispersers 

u^(t, x) 

density of dispersers coming from habitat 

Uo(x) 

pre-dispersal density 

Uq{x) 

pre-dispersal density of individuals coming from habitat 

a 

pre-dispersal density in the habitats 

Woo(x) 

cumulated density of dispersers 


cumulated density of dispersers coming from habitat 

I^T 

capture rate in trap Or 

Cr 

expected number of individuals captured in trap Or 


expected number of individuals coming from habitat 
captured in trap Or 

D{x) and Di, D 2 , -D 3 

diffusion parameters (mobility) 

V 

life expectancy of the dispersers 

n 

study site 

II 

..c 

habitats (subsets of i7) 

0T, T = 1, ... ,J 

traps (subsets of f?) 

Xh 

position of the center of the habitat 

Xr 

position of the center of the trap Or 

G 

number of individuals genotyped in each trap 

\ = 1,...,A 

index for the loci 

a = 1 ,... 

index for the alleles 

[a^ , a^) 

couple of alleles at a given locus 

PhXa 

frequency of allele a of locus A in habitat 

^hX 

allele frequencies at locus A in habitat 

Qir 

genotype of the genotyped individual in trap Or 

M 

measurement set consisting of all the genotypes in all traps 


Table 1 Summary of the notations used in the main text. 
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2 Models 


Modelling dispersal and death. We begin with a Lagrangian description of the individual move¬ 
ments. We assumed that the positions of the individuals followed 2-dimensional space- 
heterogeneous Ito diffusion processes, corresponding to uncorrelated random walks. This means 


that the individuals travel at random, with no drift in any particular direction (Preisler et al. 
2004 Smouse et al.||2010 ). However, the mobility of the individuals can be influenced by their 


position. The corresponding stochastic differential equation for the position Xt G 
individual at time t can be written: 


of an 


dXt = ^j2D{Xt)dWt, 


( 2 . 1 ) 


where Wt is the 2—dimensional Wiener process (Brownian motion). The coefficient D{Xt) is 
called the diffusion coefficient. With this model, in a small time interval of length t, each 
coordinate of Xt is incremented by a normally distributed value with mean 0 and variance 
2tD. Thus, D{x) is a measure of the local mobility of the individuals. When D is constant. 


the stochastic differential equation (2.1) corresponds to the standard Brownian motion. 


We also assumed that each individual had a life expectancy i' > 0, the death events be¬ 
ing independent and identically distributed and modelled by exponential distributions with 
parameter l/iy. 

Under these assumptions, we can switch to an Eulerian description of the population. The 
expected population density u(t, x) at time t and position x, starting from an initial distribution 
Uq satisfies the following Fokker-Planck reaction-diffusion equation (see e.g. Gardiner||2009 1: 


^ u)--, t > 0, X e 17, 

ot V 

u(0,x) = Uo{x), 


( 2 . 2 ) 


where A = is the 2-dimensional Laplace diffusion operator. The set 17 C is the 

study region. For this equation to be well-posed, some conditions on the boundary 917 of 17 have 
to be specified. In the computations of this paper, we assumed absorbing conditions (u(f, x) = 0 
on dfl). Reflecting conditions {X{D{x)u{t,x)) ■ n(x) = 0 on 9l7, where n(x) is the outward 
normal to 17) could have been assumed as well. 

The quantity uq is called the pre-dispersal density. It corresponds to the density of individu¬ 
als at a motionless stage, such as eggs, pupae or larvas in insect populations or immature seeds 
in plants. The quantity u{t, x) is the (expected) density of dispersers, e.g. insects at the adult 
stage or dispersing seeds. We assumed a zero pre-dispersal density outside some known disjoint 
subsets (habitats) 17^, h = 1,... ,H, where uq is positive and constant (equal to a > 0): 


H 

uo{x) = ^at, 

h^l 


for all X € 


(2.3) 


where is the characteristic function of the set 17^ : it takes the value 1 in 17^ and 0 

anywhere else. A possible gradual release of the pre-dispersal populations could also be assumed 


by considering a slightly modified version of the equation (2.2), see Appendix A. 


Modelling the capture of individuals from different sources. We considered the case of non- 
attractive traps, corresponding to disjoint sets 9^, t = 1,..., J in the study site. We assumed 
that the expected number of individuals Cr captured in a trap 9^ was proportional to the 
cumulated population in 9t. : 

poo 

Woo{x)dx, with ryoo(a;) = / u{t,x)dt, (2.4) 

Jo 

and Pr the capture rate (number of captured individuals per unit of time per unit of area in the 
trap 9r). Note that u{t, x) < e~^l'^ maxuo; this means that the value of Wao{x) can be precisely 
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approached by computing the above integral over a finite interval. For sufficiently small traps, 
Wooix) can be considered constant in 0^, which leads to: 

Cr = PrldrlWooiXr), (2.5) 

where Xr is the location of the center of the trap 9r, and \9r \ is the trap area. 


Remark 1 With this approach, the trapping process has no influence on the species dynamics, 
i.e., the trapped individuals are not removed from the system. To avoid this lack of realism, a 
sink term —/Srulx^e^ could be added to the right-hand side of (2.2). For the sake of simplicity, 
we assumed that the traps were small enough to consider that this term could be neglected. 


Consider now the density u^{t,x) of individuals coming from a given habitat 17^. Since all 
the individuals are supposed to share the same dispersal and death characteristics independently 
of their origin, the densities u^{t,x) satisfy 

7/^ 

= A{D{x) u^) -, t > 0, X G 17, (2.6) 

at V 

and 

u^{0,x) = Uq{x) = alxGi?'*) for all x G 17, (2.7) 

with the same boundary conditions as the total population u. The dynamics of the different 
fractions of the total population u is therefore described by a system of H decoupled reaction- 
diffusion equations. Summing up all these equations, it can be checked that for all t, x. 


(Roques et al.||2012): 


H 

u{t, x) = U^{t, x). 

h=l 


The expected number of individuals coming from a habitat 17^ and which are captured in a 
trap 9r is then given by: 

pOO 

Cx = Priori w^^^Xr), with (x) = / u^{t,x)dt. (2.8) 

Jo 

We assumed that the population was large enough so that the number of captured individuals 
was larger than a constant G in all traps. This constant corresponds to the number of individuals 
genotyped in each trap. 


3 Parameters and data 

Our goal was to estimate the diffusion parameters D(x) for all x G 17. Other unknown parame¬ 
ters are the pre-dispersal density in the habitats (a) and the capture rates in the different traps 
(/3r). 

We assumed that the global population before dispersal, Ug, was organised into several sub¬ 
populations each of which was at Hardy-Weinberg equilibrium and linkage equilibrium among 
loci. For the sake of clarity, we assumed that there were exactly H subpopulations, with den¬ 
sities Ug = each one corresponding to a habitat 17^. More complex assumptions are 

also possible, see Remark 

The positions of the habitats 17^ and of the traps 9r were known. For each subpopulation h 
and each locus A (e.g. microsatellites) out of A loci, the pre-dispersal frequencies of Ax alleles 
were known and designated as: 

J^hX = iPh\a)a=l,...,Ax- (3.9) 

The individuals captured in 9x were genotyped at the same A loci. These individuals were 
assumed to be diploid; thus, each genotype was described by: 

0 = {(ai,ai)};,=i^„.,^- 
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4 Computation of the likelihood 

The computation of genotype likelihoods builds on a combination of classical genetic assignment 
studies ( Paetkau et al.||1995 Pritchard et al.||2000 | and seed dispersal analyses from trap data 
( Robledo-Arnuncio and Garcia||2007 Klein et al.||2013 |. 

Among the individuals captured in a trap 0i-, G individuals have been genotyped. This led 
to G genotypes Qir, i = 1,... ,G. 

The conditional probability that an individual i carries alleles (a^,a^) G {1 ,...,Aa}^ at 
locus A, given that this individual comes from a habitat can be deduced from the allele 
frequencies in subpopulation h (see Section]^. The two alleles being independent, which follows 
from the Hardy-Weinberg equilibrium assumption in the subpopulation Uq, we get: 


P((a\a^)|f2^) = 2'^^phXaiPh\a^, 


(4.10) 


where = 0 if the individual is homozygous at locus A (a^ = and = 1 otherwise. Using 
the linkage equilibrium assumption among loci, we get the conditional probability of genotype 

Gir- 




(4.11) 


A=1 


where ki is the number of heterozygous loci in the genotype Gir ■ 


Remark 2 For the sake of simplicity, the dependence of and a? with respect to the locus A 
and the individual i have been dropped in our notations. For instance, in formula (4.11), a} 
and may designate different alleles, depending on the locus A and on the individual i. 

The law of total probability leads to: 


H 


V{G„) = 2 '=* 5 ] 


h^l 


n 

.A=l 


PhXa^ PhXa^ 


P(indiv. i comes from Q^). 


(4.12) 


We have seen in Sectionj^that the expected number of individuals trapped in Or was given by 
Cr, and the expected number of individuals coming from a habitat 1?^ and which are captured 
in a trap Or was given by Let us denote by Ir the number of individuals coming from 17^ 
and captured in Or, and let us set 


Ir = Y.P^G, 


(4.13) 


h=l 


the total number of individuals captured in Or- Assume that follows a Poisson distribution: 

~P(C'(f). (4.14) 

Thus, Ir also follows a Poisson distribution with parameter 


Gr=Y.Gr- 


h=l 


It can be verified that the conditional distribution of 1^ given Ir satisfies a binomial distribution 
with parameters G and G^ jGr- Thus, the conditional expectation of the proportion Ir jIr given 
Ir is Gr jGr, which is independent of G- Finally, this shows that: 

The genotyping process corresponds to a sampling without replacement, this means that the 
number of genotyped individuals coming from habitat 17^ follows a multivariate hypergeometric 
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distribution with parameters It, It and G. For large values of It, this distribution converges to 
the multinomial distribution with parameters G and {I^/It, ■ ■ ■, /It)- Using this multinomial 

distribution, we can compute the probability that a genotyped individual i trapped in 9t comes 
from a habitat h: 

f I^\ 

P(indiv. i comes from 17^) = E f (4-15) 

The hypergeometric distribution would lead to the same formula, but the advantage of the 
multinomial distribution is that it guarantees an independence assumption between the indi¬ 
viduals trapped at a same location. 

Assuming that the trapping and genotyping processes are independent, and using formulas 


(4.12) and (4.15), we finally get the likelihood function associated with the unknown parameter 


D, given the genotypes Qi- 


c{D)= n n nQ^r\D) 

= n n p(0,.iava) 


(4.16) 


= ‘2^ u n £ 

i—l,...,G h—1 


Ch ^ 

n Ph\a^ PhXa^ 

^ A=1 


where k is the total number of heterozygous loci in the genotyped population. Coming back to 
the definitions (2.5) and (2.8) of Gt and Gt, we can compute the ratio: 


^ w^jxT) 
Gt a^oo(:^r) 


(4.17) 


This shows that Ct/Ct is independent of the capture rates Pt- From the linearity of the 
equations (2.2) and (2.6), it follows that and Woo are proportional to a which means 


that the ratio G^/Gt is also independent of the choice of a. Thus, the likelihood £(Z1) can 
be computed with an arbitrary choice of parameters, e.g. (x,^t = 1- Note that if the source 
intensity a. was spatially variable, it would not simplify in the expression Gt /Gt- In such case, 
a should be estimated in each source, which is only possible up to a multiplicative constant 
since C)? would be proportional to the value of a in each habitat h. 


Remark 3 We recall that uq was organised into several subpopulations at Hardy-Weinberg 
equilibrium. Here, we assumed that, before dispersal, these subpopulations coincided with the 
habitats, which were disjoint. A first easy generalization would be to consider several subpopu¬ 
lations in the same habitat. Assume that there are R subpopulations with pre-dispersal densities 

H 

ul{x)= E and that the allele frequencies are known in these subpopulations. 

h=l 

In this case, the probability P(5iT-|f7^) does not satisfy formula (4.11) but can be computed as 
follows: 

R 

lP(0ir|I?^) = P(flzTIindiv. i comes from subpop. r), (4-18) 

r—1 


which leads to: 


R A 

hr 




PrXa^ PrXa^ • 


(4.19) 


A=1 


5 Numerical computations 

The aim of this section was to validate the maximum likelihood estimator of Section |4] on a 
simulated data set. 
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Fig. 1 Study site Q = (0,1) x (0,1) with six habitats of centers for h = 1,...,6. The grey region 
Q corresponds to a barrier to dispersal. Blue crosses correspond to the positions of the first 10 traps 0^-, r = 
1,.. ., 10; red crosses correspond to the positions of a supplementary set of ten additional traps 9t-,t = 11 ,..., 20. 


5.1 Simulated data set 


Landscape. The study site f2 was a unit square (0,1) x (0,1) containing H = 6 habitats 1?^, 
described by balls Bn^Xh) of radius R = 0.05 and centers Xh & and a rectangular region 
Q = {q — R/2,q + R/2) x (0,1) modelling a barrier to dispersal {q = 0.5). The rest of the study 
site was considered as the matrix (Fig[^. 

The general idea was to consider diffusion parameters Di in the matrix, D 2 = Di/2 in 
the habitats and D 3 = Di/10 in the barrier. However, for the well-posedness of the reaction- 
diffusion equations for u and u^, the coefficient D{x) had to be positive and smooth. We thus 
defined the heterogeneous diffusion parameters D{x) as: 

D{x) = exp -f d 2 '^^(l){x - Xh) + ds-ipix)^ , (5.20) 

for smooth positive functions (j) and such that 4 >{x — Xh) was compactly supported in B 2 R{xh) 
for any h = 1,..., S' and ip was compactly supported in {q — R, q + R) x (0,1) and maxip = 
max'!/; = 1. The precise shape of (p and ip is detailed in Appendix B. The numerical values of 
di, d 2 and ds that we used in our computations were: 

di = log(O.Ol), d 2 = -log(2), and ds = -log(lO). (5-21) 

With this framework, the diffusion parameters were equal to Di = 10“^ in the matrix, far 
from the habitats and from the barrier, to D 2 = Di /2 at the center of the habitats and to 
D 3 = Di/10 at the center of the barrier. See Remark]^ for some comments on these parameter 
values. 


Remark 4 Assuming that the unit square corresponds to a 10km x 10km region, and that the 
unit of time is one day, a diffusion parameter D = 10“^ corresponds to Ikm^/day. Using the 
formula (see e.g. Turchin|[l998[ |Roques||2013 ): 


D = 


(length of a straigth line move during one time step)^ 


4 X duration of the time step 
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(d) h. = 4 (e) = 5 (f) h. = 6 


Fig. 2 Pro babilit y w^{x)/woo{x) that an individual trapped at the position x £ H comes from the habitat 
see formula ( |4.17[ >. White regions indicate that the probability is smaller than 0.005. The value of rc5Q(a^)/inoo(a;) 
has been computed at t = 20, with the true parameter values (see Appendix D). 


for random walk movements with one direction change per minute this value of the diffusion 
parameter corresponds to a flying speed of 53m per minute. Under the same assumptions, 
D = 10“^/2 and D = 10“^/10 correspond to flying speeds of 37m and 17m per minute, 
respectively. See Kareiva (19831 for reference values of diffusion parameters of several insect 
species. 


Allele frequencies. The number of loci and the number of alleles at each locus were Tl = 10 
and A = 10. Following Pritchard et al. (2000), we drew the allele frequencies in a flat Dirichlet 
distribution, with concentration parameter q in each subpopulation h and for each locus A : 

Xsx V{q). (5.22) 


Values of q larger than 1 lead to evenly distributed frequencies, whereas smaller values of q lead 
to distributions which are concentrated on a few components. In our simulations, we adjusted 
the value of q such that the fixation index Fst among subpopulations (see Appendix C) was 
close to 0.01, 0.05 or 0.1 (Table [^, with a relative tolerance of 0.1%. 


Simulation of the measured data. We solved the reaction-diffusion models for the cumulated 
population densities Woo and for h = 1,...,6 with the true diffusion parameter values 
(5.21) and ix = 5 for the life expectancy of the dispersers (see Appendix D). The proportions 
{Cf /Cr)h=i,....& have been computed using formulas ( |2.5[ ) and (2.8). We recall that, from formu¬ 
las (4.15) and (4.17), C^jCr = w^{xr)/W oo{Xt) is the probability that an individual trapped 
in 9r comes from the habitat f2^. The probability w’f^{x)/wao{x) can be computed at each point 
in 17; it is depicted in Fig. Movies of the dynamics of the probability w^{x)/wt{x) that an 
individual trapped in 9^ between the times 0 and t comes from the habitat 17^ are available as 
supplementary materials. 

In each trap 9r (see Fig.for the locations of the traps), the numbers of genotyped individ¬ 
uals coming from the habitats 17^ followed a multinomial distribution with parameters G and 
(C'^/Ct);i=i,,.., 6. We tested the effect of the number of traps by using J = 10 or 20 traps and of 
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Fig. 3 Five sections of the 3-dimensional log-likelihood function ln(£(di, d 2 , da)). The color scale corresponds 
to the gap between the log-likelihood and the maximum log-likelihood, where the maximum is taken over the 
parameter region (di, d 2 , da) E (di — 5, di -h 5) x (d 2 — 5, d 2 -h 5) x (da — 5, da -|- 5). 


the total number of genotyped individuals by using J x G = 500 or 2000 genotyped individuals 
(Table [|). 

For each genotyped individual i, the genotype Qir was randomly drawn according to the 
allele frequencies Thx in its habitat of origin. The simulated observations consisted in the 
genotypes Qir, i.e., the set: 


M = {g„, i = l,...,G, r = l,...,J}. 


(5.23) 


For each of the 12 sets of parameters {Fst, J,JG), we simulated 70 data sets Ai. For each 
data set, the estimator (di,d 2 ,d 3 ) of {di,d 2 ,d^) has been obtained by minimizing — log(£(Il)) 
(see formula (4.16)). The minimization was performed using the Matlab® constrained gradient- 
based minimization algorithm fmincon, with the constraint 


(di, d 2 , da) S (di — 5, di -I- 5) x (d 2 — 5, d 2 -f 5) x [d^ — 5, d 3 -I- 5). 


(5.24) 


In the computation of — log(£(i7)), the numerical evaluation of the quantities {C!^/Cr)h=i,...,6 
was based on the finite element method as described in Appendix D. The average computation 
time for one estimation was about 45min with a dual Core Intel® processor, while the compu¬ 
tation of the likelihood took about 5sec. The Bayesian method ( Marin and Robert|2007 ) would 
be more computationally intensive, but it could also be used to compute a posterior distribution 
of D{x) at each position x. 


5.2 Results 


The typical profile of the log of the likelihood function (4.16) is depicted in Fig for one 
simulation with Fst = 0.05, J = 20 traps and G J = 2000 individuals. The likelihood tends to 
decay as {di,d 2 ,d^) diverges from the true value (di,d 2 ,d 3 ), suggesting an efficient parameter 


estimation by maximum likelihood, even with the constraint (5.24) was relaxed. 

The direct analysis of the quality of the estimator {di,d 2 ,d'i) is presented in Table in 
terms of the Fst value, the number of traps and the number of genotyped individuals. In this 
section, we focus on the more biologically meaningful estimators 


Di = exp 


(di) , 1)2 = exp (di -k d-^ 


£>3 = exp 


[di + 4) 


which can be directly compared to the values Di, 172, of the diffusion parameter in each of 
the three regions (matrix far from the other regions, center of the habitats and center of the 
barrier, respectively). 

For each Fst value (0.01, 0.05 and 0.1), with J = 20 traps and JG = 2000 genotyped 
individuals, we observe in Fig. |^that the median of D is correctly centered on the true value of 
D in each of the three regions (see Fig. [^and Section [ aT] for the definition of the three regions). 
We also note that there are no outliers among the estimated values of D in the matrix (the 
largest region in Fig. for all Fst values. Outliers far from the true value of D (about 12 
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Fig. 4 Effect of the on the quality of the estimator D{x) in each of the three regions: Di in the matrix 
(in white), D 2 in the habitats (in green) and D 3 in the barrier (in grey). The blue dashed lines correspond to 
the true values of D in the three regions: Di = 0.01, D 2 = 0.005 and D 3 = 0.001. 


times the true value) appear in the other regions for small Fst values (Fst= 0.01). For larger 
Fst values, the number of outliers is reduced and they are closer to the true value (2 outliers 
in the habitats and the barrier for Fst=0-05 and only 2 outliers in the habitats region for 
^ST=0.1). In the three regions (matrix/habitats/barrier), the interquartile ranges are reduced 
as the Fst is increased from 0.01 to 0.05; this decrease of the interquartile ranges by a factor 
2 in average indicates a strong effect of the Fst on the accuracy of the estimation. There is no 
clear difference between the interquartile ranges corresponding to Fst = 0.05 and Fst = 0.1; 
however, the bias and the standard deviation of (di, ^ 2 , da) are still improved when the Fst is 
increased from 0.05 to Fst = 0.1 (see Table[^. 


Fst 

fttraps J 

Jindiv. J G 

Bias (% true value) 

Std dev (% true value) 

0.01 

20 

2000 

(-0.2, 26.2, -3.9) 

(3.2, 135.4, 65.5) 

0.05 

20 

2000 

(-0.1, 5.7, 0.9) 

(1.7, 53.1, 33.7) 

0.1 

20 

2000 

(-0.1, 0.9, 0.3) 

(1.6, 49.4, 15.0) 

0.1 

20 

500 

(0.0, 15.1, -10.3) 

(3.0, 114.9, 38.6) 

0.1 

10 

2000 

(-0.1, 2.1, -5.7) 

(2.0, 36.8, 36.0) 

0.1 

10 

500 

(0.4, -10.6, -4.4) 

(4.4, 75.3, 61.2) 


Table 2 Effect of the Fg'p , the number of traps and of the total number of genotyped individuals on the quality 
of the estimator {di,d 2 ,ds). 


To better quantify the power of our genetic system to discriminate between different habitats 
of origin, depending on the Fst, we computed the average posterior probability that an individ¬ 
ual comes from its true habitat, say , given its genotype Q, that is: P(i comes from |^). 
Using Bayes theorem and considering that all of the H habitats are equally likely a priori, we 
get: 


P(i comes from 1?^ |^) 


P(^|i comes from 17^ )/H 

~H 

P(f/|* comes from 

h=l 


(5.25) 


For 300 Fst values between 0 and 0.1, we simulated 1000 data sets (allele frequencies in the id 
sites, see Section 5.1), we sampled 1000 genotypes G in one of the sites, and we averaged the 
quantity P(i comes from 17^ |^) over the 1000 individuals. We call the obtained quantity the 
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Fig. 5 Discrimination power: average posterior probability that an individual comes from its true habitat, given 
its genotype Q, in terms of the Fst- Left: effect of the number of loci A and of the number of alleles per locus 
A. Right: effect of the number of habitats H. In both figures the red curve corresponds to the parameters that 
have been used in this study. 


discrimination power of our genetic system. As expected, increasing the Fst leads to higher 
discrimination power (Fig.j^. With our parameters (iJ = 6, A = 10, A = 10), the discrimination 
power is about 0.5 for an Fst of 0.01; it reaches 0.9 for an Fst of 0.05 and 0.99 for an Fst of 0.1. 
This explains the gap between the quality of the estimators obtained with the Fst values larger 
than 0.05, compared to the case Fst = 0.01. This could also explain the small difference in the 
accuracy of the estimation when the Fst is increased to 0.1, compared to the case Fst = 0.05. 
Additionally, we note that it does not seem necessary to go beyond the value Fs7’=0.1 to get a 
very high discrimination power. This also means that the variability in our estimator D, when 
Fst=^A, is not due to an uncertainty on the habitat of origin of the genotyped individuals, 
but rather to the sampling variability due to the limited number of these individuals. 

Fig. [5] also shows the effect of the number of habitats, the number of loci and the number 
of alleles. Increasing the number of alleles per locus or the number of loci have a comparable 
effect, with in both cases an increase in the discrimination power. For Fst values close to 0, the 
discrimination power converges to 1 /i?, meaning that very small genetic differentiation between 
the subpopulations leads to equiprobable habitat of origin. In such case, the data contain no 
information on the origin of the trapped individuals, and the estimation of D is therefore not 
possible. For larger values of the Fst^ increasing the number of sites always leads to lower 
discrimination power. 

The results in Fig. [^show, for Fst = O.I, the effect of the number of traps and of the total 
number of genotyped individuals on the quality of the estimator. In all cases, the median of D 
is close to the true value, in each of the three regions. For a fixed total number of genotyped 
individuals, increasing the number of traps J from 10 to 20 does not increase significantly the 
quality of the estimator. It can even decrease it in some situations, e.g., the interquartile range 
is about twice larger with 20 traps than with 10 traps in the habitats. A possible explanation 
for this counterintuitive effect is that, when the number of traps is increased, the number G of 
individuals genotyped per trap decreases; in the regions where no traps are added (here, the 
habitats, see Fig. [^, this can lead to more uncertainty on the estimator. 

The results of Fig. [^confirm that increasing the number of genotyped individuals leads to 
far better estimations: in both cases J = 10 and J = 20, the interquartile ranges are divided 
by more than two in average when the total number of genotyped individuals is increased from 
JxG = 500 to Jx G = 2000, and the distance of the outliers to the median (i.e., approximatively 
to the true value) is reduced. 


6 Discussion 


Using broadly-recognized population models based on a mechanistic description of individual 
movements (Preisler et al. 2004 Smouse et al. 2010), we have developed an approach to esti¬ 


mate the local effect of the environment on individual mobility, based on genetic data. In an 
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Fig. 6 Effect of number of traps and of the total number of genotyped individuals on the quality of the estimator 
D(x) in each of the three regions: Di in the matrix (in white), D 2 in the habitats (in green) and D 3 in the 
barrier (in grey). The blue dashed lines correspond to the true values of D in the three regions: Di = 0.01, 
D 2 = 0.005 and = 0.001. 
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environment made of three types of regions, each one associated with a different level of mo¬ 
bility - or diffusion ~ we successfully estimated the diffusion parameters D{x) in each region. 
The reaction-diffusion framework enabled a fast computation of expected population densities, 
making parameter estimation possible in a reasonable time. Genetic data had already proved 


their effectiveness in the estimation of dispersal kernels (Robledo-Arnuncio and Garcia 2007 


Klein et al. 2013) in more empirical models. Our results show that successful estimation of 


parameters of mechanistic population models is also possible using genotype measurements 
during dispersal and allele frequency data before dispersal, without the need of abundance or 
mark-recapture data. Genetic data lead to likelihoods of arriving from a given source, and are 
insensitive to the global population size (i.e., to the parameter a in our approach) and to the 
relative efhciency of the different traps (the parameters /3 t)- This advantage of working on 
probabilities of originating from the different sources has already been shown for kernel esti¬ 
mations ( Robledo-Arnuncio and Garcia||2007} Klein et ^[2013 ). A related approach, proposed 
by Ovaskainen et al. (2008) allowed to estimate the parameters of a diffusion model, based on 
mark-recapture data, with a single type of marks. In our framework, the genotype information, 
given the allele frequencies in the different habitats constituting our study-site, can be seen 
as mark-recapture data, with several types of marks (one per habitat) and some uncertainty 
on the marks of the captured individuals. Mark-recapture experiments with several types of 
marks should lead to good estimation results, as they would combine the advantages of our 
method (insensitivity to several parameters) and of traditional mark-recapture experiments (no 
unknown external sources, perfect knowledge of the frequencies). 

The genetic differentiation between the subpopulations corresponding to the different habi¬ 
tats of origin plays a key role in the quality of the estimation, as shown by the strong effect 
of the Fst, especially on the variability of the estimator. Low Fst values are associated with 
larger standard deviations and interquartile ranges, which can be explained by a lower pos¬ 
terior probability associated to the true habitat of origin of a genotyped individual. Defining 
the discrimination power of our genetic system as the average of this posterior probability, we 
could disentangle the effect of the Fgj^ and that of the other sources of uncertainty in our es¬ 
timator of the diffusion parameters D{x). With an intermediate level of genetic differentiation 
{Fst = 0.05), the discrimination power was high (0.9 with 20 traps and 2000 genotyped indi¬ 
viduals), and the quality of our estimator of D{x) was comparable to the case Fgr = 0.1. With 
an Fst of 0.1, the discrimination power was close to 1, which is almost exact; the remaining 
uncertainty in the determination of D{x) may therefore be sampling variance due to the finite 
number of genotyped individuals per trap. It could also be due to the lack of uniqueness in the 
inverse problem of determining D{x), even with inhnite population sizes. From a theoretical 
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viewpoint, the unique determination of diffusion and conductivity coefficients based on a finite 


set of measurements is a difficult problem, as illustrated by the Calderon problem (Calderon 


19801 of determining the electrical conductivity of a medium for which uniqueness is only proved 
with infinitely many observations ( Sylvester and Uhlmann|[l987 Nachman||1996 1. 

As expected, the size of the post-dispersal sample has a clear effect on the uncertainty of 
the estimation. The quality of the estimation, and especially its variability, is clearly improved 
when the number of genotyped individuals is increased. If the total number of genotyped loci 
and the number of traps were fixed, the trade-off between increasing the number of loci and 
the number of genotyped individuals per trap would depend on the main source of uncertainty: 
increasing the number of loci per individual increases the discrimination power while increasing 
the number of genotyped individuals per trap reduces the sampling variance. The role of the 
number of traps is less obvious. Intuitively, increasing the number of traps leads to a better 
coverage of the study site, which should have a positive impact on the estimation. However, 
with a hxed total number of genotyped individuals, this leads to a decrease in the number 
of genotyped individuals per trap and can therefore produce more uncertainty in the regions 
where the number of traps has not been increased (the habitats in our simulations). It should be 
noted however that too few traps may lead to identifiability problems, as would in the extreme 
case of a unique trap placed at equidistance between two habitats in an otherwise homogeneous 
landscape. 


Remarkably, the estimation of D{x) in the matrix remains accurate, with no outliers among 
the estimators of D{x) even with low Fst values. This may be the consequence of the larger 
area of the matrix compared to the other regions in the landscape leading to a stronger effect 
of the value of the diffusion parameter in this region. Conversely, larger standard deviations 
and interquartile ranges are observed in the habitats, for all Fst values. This cannot be fully 
explained by the smaller area of the habitat region, as the estimation on the barrier is more 
accurate, with a comparable area. Most of the individuals trapped in a given habitat come from 
the same habitat (about 90%). The remaining individuals being sparse, this leads to higher 
relative variance in the proportions of individuals trapped in the habitats than in the other 
regions, which can explain the lower accuracy of the estimation of the diffusion parameter in 
the habitats. Based on these observations, we suspect that the estimation of a single coefficient 
in a homogeneous environment would most likely be reliable, even with low Fst values, and that 
placing the traps far from the release sites should lead to a better estimation of the coefficient 
in such case. 


In addition to the Hardy-Weinberg equilibrium and independence of loci, an important 
assumption in our approach was that the allele frequencies were exactly known in the habitats. 
In practice, the frequencies are determined from previously sampled populations. The sample 
size is known to have an important effect on successful assignment of genotyped individuals 
(Cornuet et al. 19991. Reducing this size should lead to some uncertainty in the allele frequencies 
with an effect comparable to that of decreasing the Fst, he, lowering the discrimination power 


of the genotype data. A problematic case noted in Paetkau et al. (1995) and Cornuet et al. 


(1999) while studying assignment methods is when some individuals carry an allele which has 
not been detected in the sample corresponding to their population of origin, leading to a null 
posterior probability that these individuals come from their true habitat. In such case, |Paetkau| 


et al. (1995) suggest to add the genotype of these individuals to the population samples defining 


the allele frequencies in all of the habitats. 


In our study, the allele frequencies are determined before dispersal, in individuals from the 
same generation as the trapped individuals. At each generation, the allele frequencies are mod¬ 
ified, due to drift and gene flow among habitat patches. Thus, using allele frequencies sampled 
from previous generations could lead to an inaccuracy in the frequencies which depends on the 
gene flow and on the number of generations before the capture of the genotyped dispersers. 
To estimate the effect of the population flows on the allele frequencies after one generation, 
we computed the quantity C^jC-r, for t = h, corresponding to the proportion of individuals 
captured in 17^, whose habitat of origin is 17^. The values clearly depend on the proximity of 
another habitat, with a ratio of 0.98 in 17^, which is the farthest from the other habitats, and 
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of 0.82 in and which are close to each other (the other ratios are 0.87, 0.85 and 0.94 
in f? 2 , and respectively). Thus, the allele frequencies may remain stable after several 
generations if the habitats are sufficiently far from each other or become rapidly inaccurate in 
the opposite situation. A difficulty in estimating whether two habitats are far from each other 
is that the population flows are not known a priori. 

Our approach was based on an unbiased Brownian motion description of the individual tra¬ 
jectories, which leads to a Fokker-Planck reaction-diffusion equation. It can easily be extended 
to include a bias in a direction (ui, V 2 ){t, x) modelling attractiveness or repulsiveness of some el¬ 
ements of the landscape, by adding a term —d{u vi)/dxi — d(uv 2 ) / dx 2 in the reaction-diffusion 
equation (2.2). The method proposed here also readily applies to other linear dispersal terms, 
such as the Fickian diffusion V • {D{x)Wu) which is nevertheless more adapted to describe elec¬ 
tric and thermic conductivity ( Roques et al.||20d8 Roques|[2013 1. More general Levy processes 
than Brownian diffusion, corresponding to movements with large jumps, could have been con¬ 
sidered as well; in such cases, the Laplace diffusion operator A would be replaced by a fractional 
Laplace operator ( Valdino^|2009 1. Integral kernel-based dispersal terms which can account for 


long distance dispersal events could also be considered in place of the diffusion approach (Kot 

eTatlllgMl ). 


The purpose of our study was to propose a rigourous method for the estimation of dispersal 
parameters in stochastic differential equation-based models of individual movement. Here, the 
method performance analysis assumed the same type of model for simulations and for inference. 
In practice, the real dispersal process may differ from the assumptions of the model. Future 
work should focus on the robustness of the method when the model assumptions are violated. 

Another possible extension of our study is to include the estimation of the relative pre¬ 
dispersal densities in the different habitats of origin and/or the areas of these habitats. It is not 
straightforward however that these problems are identifiable. 

As suggested by Remarkj^ pre-dispersal subpopulations may not coincide with the habitats. 
The location of the subpopulations may also not be known a priori. A challenging extension of 
our approach would consist in clustering the subpopulations and inferring the allelic frequencies 
(following a Structure-like approach, see Pritchard et al.pOOO ) together with the estimation of 
D{x) in a full Bayesian approach, based on pre-dispersal genotype data in the habitats and 
genotype data of trapped dispersers. 


Appendix A: gradual release of the pre-dispersal populations 


The equation (2.2) describes a simultaneous release of all the individuals at t = 0. To account 
for a possible gradual release of the individuals, the equation (2.2) can be replaced by: 


du zi 

— = A{D{x) u) - -+ uo(x) fit), t> 0, X € f2, 


(6.26) 


where the term uoix)/(t) describes the release of the individuals; uo{x) still corresponds to 
the pre-dispersal density and the function /(t) is the release rate. It can be described by any 
nonnegative function or distribution with integral 1 and with support in [0, T], T corresponding 
to the end of the release period. In this framework, the density of dispersers coming from habitat 
17^ satisfies the equation: 


F)i / ^ 

— = AiD{x)u'^) 
where Uq is still given by \2.7\. 


- —I- uq(x) fit), t > 0, X € n. 


(6.27) 


Appendix B: precise shape of the diffusion terms 

In our numerical computations, we took 

ifix) = H 2 Ri\\x\\) and tpix) = tpixi,X 2 ) = IJLr ixi - q), 
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for the function ^ defined by (see Fig. [^: 


= exp 


(r2 - R-^y 


for r S {—R, R) and ^^(r) = 0 otherwize. 



Fig. 7 The function for R = 0.05 and r S (—0.1,0.!). 


Appendix C: computation of the Fst 


The index Fst is used as a measure of genetic differentiation among the subpopulations. It was 
computed as follows: we set 


= z E E £ ^ 1E E E 

\=la=lh=l ' - ' - 


X=1 a=l 


h=l 


where A is the number of loci, A, the number of alleles per locus whose frequency is measured 
and H the number of subpopulations, and 


Fst 


Hs — Ht 
1 - Ht 


(6.28) 


This formula corresponds to Nei’s Gst for a single locus (Nei 19731, with numerator and 
denominator averaged over the A loci. In our computations, all the subpopulations had the 
same size; in other situations, the weight l/H in the above formulas for Hs and Ht should be 
replaced by the relative sizes of the subpopulations. 


Appendix D: numerical computation of the cumulated population densities 

In order to compute the cumulated densities Woo{x) and w^{x), we used the time-dependent 
partial differential equation solver Comsol Multiphysics® applied to the evolution equations 
(6.30) and (6.32) below at large time {t = 20), with default parameter values (finite element 
method with second order basis elements) and a triangular mesh adapted to the geometry of 
our landscape and made of 5296 elements. 

We defined the cumulated population density at intermediate times t and position x by: 


Wt{x) = / u{s,x) ds, for alH > 0, x G Q. 

Jo 


(6.29) 


Integrating (2.2) between 0 and t > 0 we note that Wt{x) satisfies the following equation: 

^ = A{D{x) wt)- — + uo(x), t>0, X G [2, 
at v 


(6.30) 
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and wo(x) = 0. 

Similarly, the cumulated population density of individuals coming from is: 

w^(x) = / u^{s,x) ds, for alH > 0, x G fl. 

Jo 


This function satisfies: 

= A{D{x) w^) — — + v!^{x), t > 0, a; e 17, 


and Wq(x) = 0. 
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